Abstract

Let $\{X_n\}$ be a stationary and ergodic time series taking values
from a finite or countably infinite set $\mathcal{X}$. Assume that the
distribution of the process is otherwise unknown. We propose a sequence of
stopping times $\lambda_n$ along which we will be able to estimate the
conditional probability $P(X_{\lambda_n+1} = x \mid X_0, \dots, X_{\lambda_n})$
from the data segment $(X_0, \dots, X_{\lambda_n})$ in a pointwise consistent
way for a restricted class of stationary and ergodic finite or countably
infinite alphabet time series which includes, among others, all stationary
and ergodic finitarily Markovian processes. If the stationary and ergodic
process turns out to be finitarily Markovian (among others, all stationary
and ergodic Markov chains are included in this class) then
$\lim_{n\to\infty} \frac{n}{\lambda_n} > 0$ almost surely. If the stationary
and ergodic process turns out to possess finite entropy rate then $\lambda_n$
is upper bounded by a polynomial, eventually almost surely.
1 Introduction

Bailey [1] and Ryabko [14] considered the problem of estimating the
conditional probability $P(X_{n+1} = 1 \mid X_0, \dots, X_n)$ for binary time
series. They showed that one cannot estimate this quantity from the data
$(X_0, \dots, X_n)$ such that the difference tends to zero almost surely as
$n$ increases, for all stationary and ergodic binary time series.

It is well known that if one knows in advance that the process is Markov
with arbitrary (unknown) order, then one can estimate the order (cf. Csiszar
and Shields [4], Csiszar [5]), and using this estimate for the order, one can
count empirical averages of blocks with lengths one plus the order for
estimating $P(X_{n+1} = 1 \mid X_0, \dots, X_n)$ in a pointwise consistent
way. In the present paper we will consider the case when it is not known in
advance whether the process is Markov or not.

Morvai [11] exhibited a sequence of stopping times $\eta_n$ such that
$P(X_{\eta_n+1} = 1 \mid X_0, \dots, X_{\eta_n})$ can be estimated from the
data segment $(X_0, \dots, X_{\eta_n})$ in a pointwise consistent way, that
is, the error vanishes as $n$ increases. The disadvantage of that scheme is
that the stopping times grow very fast. Another, more reasonable scheme was
proposed by Morvai and Weiss [12] for a subclass of stationary and ergodic
binary time series. There the stopping times still grow exponentially, though
not so fast as in Morvai [11].

Bailey [1] proved that there is no test for the Markov property, that is,
there is no algorithm which could tell you eventually if the process is Markov
with any order or not, over all stationary and ergodic binary time series.

In this paper discrete (finite or countably infinite) alphabet stationary
and ergodic processes are treated. We propose a much denser (compared to
Morvai and Weiss [12]) sequence of stopping times $\lambda_n$ along which we
will be able to estimate $P(X_{\lambda_n+1} = x \mid X_0, \dots, X_{\lambda_n})$
from samples $(X_0, \dots, X_{\lambda_n})$ in a pointwise consistent way for
those processes whose conditional distribution is almost surely continuous
(see the precise definition below). This class includes all Markov processes
with arbitrary order and the much wider class of finitarily Markovian
processes. Despite Bailey's result, for the proposed stopping times
$\lambda_n$, if the stationary and ergodic process turns out to be finitarily
Markovian (which includes all stationary and ergodic Markov chains with
arbitrary order) then $\lim_{n\to\infty} \frac{n}{\lambda_n} > 0$ almost
surely. If the stationary and ergodic process turns out to possess finite
entropy rate then $\lambda_n$ is upper bounded by a polynomial, eventually
almost surely.



2 The Proposed Algorithm 

Let $\{X_n\}_{n=-\infty}^{\infty}$ be a stationary and ergodic time series
taking values from a discrete (finite or countably infinite) alphabet
$\mathcal{X}$. (Note that all stationary time series $\{X_n\}_{n=0}^{\infty}$
can be thought to be a two-sided time series, that is,
$\{X_n\}_{n=-\infty}^{\infty}$.) For notational convenience, let
$X_m^n = (X_m, \dots, X_n)$, where $m \le n$. Note that if $m > n$ then
$X_m^n$ is the empty string.

For $k \ge 1$, let $1 \le l_k \le k$ be a nondecreasing unbounded sequence of
integers, that is, $1 = l_1 \le l_2 \le \dots$ and
$\lim_{k\to\infty} l_k = \infty$.

Define auxiliary stopping times (similarly to Morvai and Weiss [12]) as
follows. Set $\zeta_0 = 0$. For $n = 1, 2, \dots$, let

$$\zeta_n = \zeta_{n-1} + \min\{t > 0 : X_{\zeta_{n-1}-(l_n-1)+t}^{\zeta_{n-1}+t} = X_{\zeta_{n-1}-(l_n-1)}^{\zeta_{n-1}}\}. \tag{1}$$
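The recurrence in (1) can be tried out on a finite sample. The following is a minimal sketch only: the function name `auxiliary_stopping_times`, the representation of the sequence $l_k$ as a Python callable `l`, and the decision to stop as soon as the data run out are illustrative assumptions, not part of the paper.

```python
def auxiliary_stopping_times(x, l):
    """Return [zeta_0, zeta_1, ...] as far as the finite sample x allows.

    x : sequence of symbols X_0, ..., X_N
    l : callable k -> l_k (assumed nondecreasing, with 1 <= l_k <= k)
    """
    x = list(x)
    zetas = [0]                                # zeta_0 = 0
    n = 1
    while True:
        prev = zetas[-1]
        start = prev - (l(n) - 1)              # block X_{start}^{prev}, length l_n
        block = x[start:prev + 1]
        t = 1
        found = False
        while prev + t < len(x):               # shifted window must fit in the sample
            if x[start + t:prev + t + 1] == block:
                found = True
                break
            t += 1
        if not found:                          # next recurrence not yet observed
            return zetas
        zetas.append(prev + t)
        n += 1
```

For instance, with the hypothetical choice `l = lambda k: min(k, 2)` (bounded, so usable only for a short illustration) and the sample `[0, 1, 0, 0, 1, 0, 1, 1, 0, 1]`, the block of length $l_n$ ending at $\zeta_{n-1}$ recurs first at positions 2, 5 and 8.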

Among other things, using $\zeta_n$ and $l_n$ we can define a very useful
process $\{\tilde X_n\}_{n=-\infty}^{0}$ as a function of $X_0^{\infty}$ as
follows. Let $J(n) = \min\{j \ge 1 : l_{j+1} > n\}$ and define

$$\tilde X_{-i} = X_{\zeta_{J(i)}-i} \quad \text{for } i \ge 0. \tag{2}$$
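As an illustration of (2), the sketch below reads off the coordinates $\tilde X_0, \tilde X_{-1}, \dots$ from a finite sample, given stopping times $\zeta_0, \zeta_1, \dots$ that have already been computed. The function name `tilde_past` and the convention of stopping once $\zeta_{J(i)}$ is no longer available are assumptions made for the example.

```python
def tilde_past(x, zetas, l):
    """Return [X~_0, X~_{-1}, X~_{-2}, ...] as far as the known zetas allow.

    x     : sequence of symbols X_0, ..., X_N
    zetas : list [zeta_0, zeta_1, ...] of auxiliary stopping times
    l     : callable k -> l_k (nondecreasing, unbounded, 1 <= l_k <= k)
    """
    out = []
    i = 0
    while True:
        # J(i) = min{ j >= 1 : l_{j+1} > i }, searched only among available zetas
        j = 1
        while j < len(zetas) and l(j + 1) <= i:
            j += 1
        if j >= len(zetas):        # zeta_{J(i)} has not been observed yet
            return out
        out.append(x[zetas[j] - i])
        i += 1
```

With the alternating sample `[0, 1] * 8`, the hypothetical choice `l = lambda k: (k + 1) // 2` and `zetas = [0, 2, 4, 6, 8, 10, 12, 14]`, the first four recovered coordinates are `0, 1, 0, 1`, which is the alternating pattern read backwards from a position holding the symbol 0.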

As we will see in the proof of the Theorem, $\{\tilde X_n\}_{n=-\infty}^{0}$
has the same distribution as the original process. For notational convenience
let $p_k(x_{-k}^0)$ and $p_k(y \mid x_{-k}^0)$ denote the distribution
$P(X_{-k}^0 = x_{-k}^0)$ and the conditional distribution
$P(X_1 = y \mid X_{-k}^0 = x_{-k}^0)$, respectively.

Definition 1. For a stationary time series $\{X_n\}$ the (random) length
$K(X_{-\infty}^0)$ of the memory of the sample path $X_{-\infty}^0$ is the
smallest possible $0 \le K < \infty$ such that for all $i \ge 1$, all
$y \in \mathcal{X}$, all $z_{-K-i+1}^{-K} \in \mathcal{X}^i$

$$p_{K-1}(y \mid X_{-K+1}^0) = p_{K+i-1}(y \mid z_{-K-i+1}^{-K}, X_{-K+1}^0)$$

provided $p_{K+i}(z_{-K-i+1}^{-K}, X_{-K+1}^0, y) > 0$, and
$K(X_{-\infty}^0) = \infty$ if there is no such $K$.

Definition 2. The stationary time series $\{X_n\}$ is said to be finitarily
Markovian if $K(X_{-\infty}^0)$ is finite (though not necessarily bounded)
almost surely.

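In order to estimate $K(\tilde X_{-\infty}^0)$ we need to define some explicit
statistics. Define

$$\Delta_k(\tilde X_{-k+1}^0) = \sup_{1 \le i}\ \sup_{\{z_{-k-i+1}^{-k} \in \mathcal{X}^i,\ x \in \mathcal{X}\,:\, p_{k+i}(z_{-k-i+1}^{-k}, \tilde X_{-k+1}^0, x) > 0\}} \left| p_{k-1}(x \mid \tilde X_{-k+1}^0) - p_{k+i-1}(x \mid (z_{-k-i+1}^{-k}, \tilde X_{-k+1}^0)) \right|.$$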



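We will divide the data segment $X_0^n$ into two parts:
$X_0^{\lceil n/2 \rceil - 1}$ and $X_{\lceil n/2 \rceil}^n$. Let
$\mathcal{L}_{n,k}^{(1)}$ denote the set of strings with length $k+1$ which
appear at all in $X_0^{\lceil n/2 \rceil - 1}$. That is,

$$\mathcal{L}_{n,k}^{(1)} = \{x_{-k}^0 \in \mathcal{X}^{k+1} : \exists\, k \le t \le \lceil n/2 \rceil - 1 : X_{t-k}^t = x_{-k}^0\}.$$

For a fixed $0 < \gamma < 1$ let $\mathcal{L}_{n,k}^{(2)}$ denote the set of
strings with length $k+1$ which appear more than $n^{1-\gamma}$ times in
$X_{\lceil n/2 \rceil}^n$. That is,

$$\mathcal{L}_{n,k}^{(2)} = \{x_{-k}^0 \in \mathcal{X}^{k+1} : \#\{\lceil n/2 \rceil + k \le t \le n : X_{t-k}^t = x_{-k}^0\} > n^{1-\gamma}\}.$$

Let

$$\mathcal{L}_k^n = \mathcal{L}_{n,k}^{(1)} \cap \mathcal{L}_{n,k}^{(2)}.$$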
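To make the two-part split concrete, here is a small sketch that collects the blocks of length $k+1$ seen in the first half of the sample and keeps those that also occur more than $n^{1-\gamma}$ times in the second half. The function name `candidate_blocks` and the use of tuples to represent blocks are illustrative assumptions.

```python
import math
from collections import Counter


def candidate_blocks(x, n, k, gamma):
    """Blocks of length k+1 that appear in X_0^{ceil(n/2)-1} and occur more
    than n**(1-gamma) times in X_{ceil(n/2)}^n.  Requires len(x) >= n + 1."""
    h = math.ceil(n / 2)
    # blocks ending at t = k, ..., h-1 (first half of the data)
    first = {tuple(x[t - k:t + 1]) for t in range(k, h)}
    # occurrence counts for blocks ending at t = h+k, ..., n (second half)
    counts = Counter(tuple(x[t - k:t + 1]) for t in range(h + k, n + 1))
    second = {b for b, c in counts.items() if c > n ** (1 - gamma)}
    return first & second
```

For the alternating sample `[0, 1] * 10` with `n = 19`, `k = 1` and `gamma = 0.5`, the block `(0, 1)` occurs five times in the second half and `(1, 0)` only four times, so with the threshold $19^{0.5} \approx 4.36$ only `(0, 1)` is retained.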



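We define the empirical version of $\Delta_k$ as follows:

$$
\hat\Delta_k^n(\tilde X_{-k+1}^0) = \max_{1 \le i \le n}\ \max_{(z_{-k-i+1}^{-k},\, \tilde X_{-k+1}^0,\, x) \in \mathcal{L}_{k+i}^n} 1_{\{\zeta_{J(k)} \le \lceil n/2 \rceil - 1\}}
\left|
\frac{\#\{\lceil n/2 \rceil + k \le t \le n : X_{t-k}^t = (\tilde X_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k - 1 \le t \le n-1 : X_{t-k+1}^t = \tilde X_{-k+1}^0\}}
-
\frac{\#\{\lceil n/2 \rceil + k + i \le t \le n : X_{t-k-i}^t = (z_{-k-i+1}^{-k}, \tilde X_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k + i - 1 \le t \le n-1 : X_{t-k-i+1}^t = (z_{-k-i+1}^{-k}, \tilde X_{-k+1}^0)\}}
\right|.
$$

Note that the cut-off $1_{\{\zeta_{J(k)} \le \lceil n/2 \rceil - 1\}}$ ensures
that $\tilde X_{-k+1}^0$ is defined from $X_0^{\lceil n/2 \rceil - 1}$.
Observe that, by ergodicity, for any fixed $k$,

$$\liminf_{n\to\infty} \hat\Delta_k^n \ge \Delta_k \quad \text{almost surely.} \tag{3}$$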



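We define an estimate $\chi_n$ for $K(\tilde X_{-\infty}^0)$ from samples
$X_0^n$ as follows. Let $0 < \beta < \frac{1-\gamma}{2}$ be arbitrary. Set
$\chi_0 = 0$, and for $n \ge 1$ let $\chi_n$ be the smallest
$0 \le k_n < n$ such that $\hat\Delta_{k_n}^n \le n^{-\beta}$.

Observe that if $\zeta_j \le \lceil n/2 \rceil - 1 < \zeta_{j+1}$ then
$\chi_n \le l_{j+1}$.

Here the idea is (cf. the proof of the Theorem) that if
$K(\tilde X_{-\infty}^0) < \infty$ then $\chi_n$ will be equal to
$K(\tilde X_{-\infty}^0)$ eventually and if $K(\tilde X_{-\infty}^0) = \infty$
then $\chi_n \to \infty$.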

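Now we define the sequence of stopping times $\lambda_n$ along which we will
be able to estimate. Set $\lambda_0 = \zeta_0$, and for $n \ge 1$ if
$\zeta_j \le \lambda_{n-1} < \zeta_{j+1}$ then put

$$\lambda_n = \min\{t > \lambda_{n-1} : X_{t-\kappa_n+1}^{t} = X_{\zeta_j-\kappa_n+1}^{\zeta_j}\} \tag{4}$$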

and 

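Observe that if $\zeta_j \le \lambda_{n-1} < \zeta_{j+1}$ then
$\zeta_j \le \lambda_{n-1} < \lambda_n \le \zeta_{j+1}$. If
$\chi_{\lambda_{n-1}+1} = 0$ then $\lambda_n = \lambda_{n-1} + 1$. Note that
$\lambda_n$ is a stopping time and $\kappa_n$ is our estimate for
$K(\tilde X_{-\infty}^0)$ from samples $X_0^{\lambda_n}$.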

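Let $\mathcal{X}^{*-}$ be the set of all one-sided sequences, that is,

$$\mathcal{X}^{*-} = \{(\dots, x_{-1}, x_0) : x_i \in \mathcal{X} \text{ for all } -\infty < i \le 0\}.$$

Let $f : \mathcal{X} \to (-\infty, \infty)$ be bounded, otherwise arbitrary.
Define the function $F : \mathcal{X}^{*-} \to (-\infty, \infty)$ as

$$F(x_{-\infty}^0) = E(f(X_1) \mid X_{-\infty}^0 = x_{-\infty}^0).$$

E.g. if $f(x) = 1_{\{x = z\}}$ for a fixed $z \in \mathcal{X}$ then
$F(y_{-\infty}^0) = P(X_1 = z \mid X_{-\infty}^0 = y_{-\infty}^0)$. If
$\mathcal{X}$ is a finite or countably infinite subset of the reals and
$f(x) = x$ then
$F(y_{-\infty}^0) = E(X_1 \mid X_{-\infty}^0 = y_{-\infty}^0)$.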

One denotes the $n$th estimate of
$E(f(X_{\lambda_n+1}) \mid X_0^{\lambda_n})$ from samples $X_0^{\lambda_n}$
by $f_n$, and defines it to be

$$f_n = \frac{1}{n} \sum_{i=0}^{n-1} f(X_{\lambda_i+1}). \tag{6}$$

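Once the stopping times are available, the estimate in (6) is a plain average of $f$ over the symbols that immediately follow them. The sketch below assumes the stopping times $\lambda_0, \dots, \lambda_{n-1}$ have already been computed and are passed in as a list; the function name `estimate_along_stopping_times` is hypothetical.

```python
def estimate_along_stopping_times(x, lams, f):
    """Compute f_n = (1/n) * sum_{i=0}^{n-1} f(X_{lambda_i + 1}).

    x    : sequence of symbols X_0, ..., X_N
    lams : list [lambda_0, ..., lambda_{n-1}], each lambda_i + 1 <= N
    f    : bounded real-valued function on the alphabet
    """
    n = len(lams)
    return sum(f(x[lam + 1]) for lam in lams) / n
```

Taking `f = lambda s: 1.0 if s == 1 else 0.0` turns the average into an estimate of the conditional probability that the next symbol equals 1.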
3 Main Results 

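Define the distance $d^*(\cdot, \cdot)$ on $\mathcal{X}^{*-}$ as follows. For
$x_{-\infty}^0, y_{-\infty}^0 \in \mathcal{X}^{*-}$ let

$$d^*(x_{-\infty}^0, y_{-\infty}^0) = \sum_{i=0}^{\infty} 2^{-i-1}\, 1_{\{x_{-i} \ne y_{-i}\}}. \tag{7}$$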
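The distance in (7) weights a disagreement at lag $i$ by $2^{-i-1}$, so two pasts are close exactly when they agree on a long most-recent stretch. A minimal sketch for finitely many known coordinates follows; the name `d_star` and the truncation to the common known length are assumptions of the example, and the unseen tail can add at most $2^{-m}$ when $m$ coordinates are compared.

```python
def d_star(x_past, y_past):
    """Truncated version of d*: x_past[i] holds x_{-i}, y_past[i] holds y_{-i}.

    Only the first m = min(len(x_past), len(y_past)) coordinates are compared.
    """
    m = min(len(x_past), len(y_past))
    return sum(2.0 ** (-i - 1) for i in range(m) if x_past[i] != y_past[i])
```

Two pasts that first differ at lag 2 are at truncated distance $2^{-3}$, whereas a disagreement in the most recent symbol alone already contributes $1/2$.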
Definition 3. We say that $F(X_{-\infty}^0)$ is almost surely continuous if
for some set $C \subseteq \mathcal{X}^{*-}$ which has probability one the
function $F(X_{-\infty}^0)$ restricted to this set $C$ is continuous with
respect to the metric $d^*(\cdot, \cdot)$. (Cf. Morvai and Weiss [12].)

The processes with almost surely continuous conditional expectation
generalize the processes for which it is actually continuous, cf. Kalikow [9]
and Keane [10]. The stationary finitarily Markovian processes are included in
the class of stationary processes with almost surely continuous
$E(f(X_1) \mid X_{-\infty}^0)$ for arbitrary bounded $f(\cdot)$.

Note that Ryabko [14], and Gyorfi, Morvai, Yakowitz [7] showed that one
cannot estimate $P(X_{n+1} = 1 \mid X_0^n)$ for all $n$ in a pointwise
consistent way even for the class of all stationary and ergodic binary
finitarily Markovian time series.

The entropy rate $H$ associated with a stationary finite or countably
infinite alphabet time series $\{X_n\}$ is defined as
$H = \lim_{n\to\infty} -\frac{1}{n+1} \sum_{x_{-n}^0 \in \mathcal{X}^{n+1}} p_n(x_{-n}^0) \log_2 p_n(x_{-n}^0)$.
We note that the entropy rate of a stationary finite alphabet time series is
finite. For details cf. Cover, Thomas [3], pp. 63-64.
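As a small check of this definition, the sketch below evaluates the normalised block entropy for an independent and identically distributed source, where block probabilities factorise and the quantity should coincide with the single-letter entropy for every $n$. The function name `block_entropy_rate` and the restriction to an i.i.d. source are assumptions made purely for illustration.

```python
import itertools
import math


def block_entropy_rate(p, n):
    """-(1/(n+1)) * sum over blocks x_{-n}^0 of p_n(x) * log2 p_n(x),
    for an i.i.d. source with letter probabilities p (a list summing to 1)."""
    total = 0.0
    for block in itertools.product(range(len(p)), repeat=n + 1):
        prob = math.prod(p[s] for s in block)   # i.i.d.: product of letter probabilities
        if prob > 0:
            total -= prob * math.log2(prob)
    return total / (n + 1)
```

For a fair coin the value is one bit per symbol for every block length; for a general stationary process the block probabilities would have to come from the process itself rather than from a product.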

Fix positive real numbers $0 < \beta, \gamma < 1$ such that
$2\beta + \gamma < 1$, fix a sequence $l_n$ such that
$1 = l_1 \le l_2 \le \dots$, $l_n \to \infty$, and fix a bounded function
$f(\cdot) : \mathcal{X} \to (-\infty, \infty)$, and with these numbers,
sequence and function define $\zeta_n$, $\chi_n$, $\kappa_n$, $\lambda_n$ and
$F(\cdot)$ as described in the previous section. For the resulting $f_n$ we
have the following theorem:

THEOREM. Let $\{X_n\}$ be a stationary and ergodic time series taking values
from a finite or countably infinite set $\mathcal{X}$. If the conditional
expectation $F(X_{-\infty}^0)$ is almost surely continuous then almost surely,

$$\lim_{n\to\infty} f_n = F(\tilde X_{-\infty}^0) \quad \text{and} \quad \lim_{n\to\infty} \left| f_n - E(f(X_{\lambda_n+1}) \mid X_0^{\lambda_n}) \right| = 0.$$

The $l_n$ may be chosen in such a fashion that whenever the stationary and
ergodic time series $\{X_n\}$ has finite entropy rate then the $\lambda_n$
grow no faster than a polynomial in $n$.

If the stationary and ergodic time series $\{X_n\}$ turns out to be
finitarily Markovian then

$$\lim_{n\to\infty} \frac{\lambda_n}{n} = \frac{1}{p_{K(\tilde X_{-\infty}^0)-1}\big(\tilde X_{-K(\tilde X_{-\infty}^0)+1}^0\big)} < \infty \quad \text{almost surely.}$$

Moreover, if the stationary and ergodic time series $\{X_n\}$ turns out to be
independent and identically distributed then
$\lambda_n = \lambda_{n-1} + 1$ eventually almost surely.

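Proof of the Theorem:

Step 1. The time series $\{\tilde X_n\}_{n=-\infty}^{0}$ and
$\{X_n\}_{n=-\infty}^{0}$ have identical distribution.

For all $k \ge 1$ and $1 \le i \le k$ define (similarly to Morvai and Weiss
[12]) $\hat\zeta_0^k = 0$ and

Let $T$ denote the left shift operator, that is,
$(T x_{-\infty}^{\infty})_i = x_{i+1}$. It is easy to see that if
$\zeta_k(x_{-\infty}^{\infty}) = l$ then
$\hat\zeta_k^k(T^l x_{-\infty}^{\infty}) = -l$.

Now the statement follows from stationarity and the fact that for $k \ge 0$,
$n \ge 0$, $x_{-n}^0 \in \mathcal{X}^{n+1}$, $l \ge 0$,

$$T^l \{X_{\zeta_k-n}^{\zeta_k} = x_{-n}^0,\ \zeta_k = l\} = \{X_{-n}^0 = x_{-n}^0,\ \hat\zeta_k^k(X_{-\infty}^0) = -l\}. \tag{8}$$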

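Step 2. We show that
$P(\chi_n = K(\tilde X_{-\infty}^0) \text{ eventually} \mid K(\tilde X_{-\infty}^0) < \infty) = 1$
and
$P(\lim_{n\to\infty} \chi_n = \infty \mid K(\tilde X_{-\infty}^0) = \infty) = 1$.

By Step 1, $\{\tilde X_n\}_{n=-\infty}^{0}$ is stationary and ergodic with the
same distribution as $\{X_n\}_{n=-\infty}^{0}$. We may assume that the sample
path $\tilde X_{-\infty}^0$ is such that all finite blocks that appear have
positive probability. It is immediate that if
$K(\tilde X_{-\infty}^0) < \infty$ then for all
$k \ge K(\tilde X_{-\infty}^0)$, $\Delta_k = 0$ and
$\Delta_{K(\tilde X_{-\infty}^0)-1} > 0$ (otherwise the length of the memory
would be not greater than $K(\tilde X_{-\infty}^0) - 1$). If
$K(\tilde X_{-\infty}^0) = \infty$ then $\Delta_k > 0$ for all $k$ (otherwise
$K(\tilde X_{-\infty}^0)$ would be finite). Thus by (3) if
$K(\tilde X_{-\infty}^0) = \infty$ then $\chi_n \to \infty$ and if
$K(\tilde X_{-\infty}^0) < \infty$ then $\chi_n \ge K(\tilde X_{-\infty}^0)$
eventually almost surely. We have to show that
$\chi_n \le K(\tilde X_{-\infty}^0)$ eventually almost surely provided that
$K(\tilde X_{-\infty}^0) < \infty$.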

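Fix now $k < n$. We will estimate the probability of the undesirable event as
follows:

$$
\begin{aligned}
&P\big(\hat\Delta_k^n > n^{-\beta},\ K(\tilde X_{-\infty}^0) = k \,\big|\, X_0^{\lceil n/2 \rceil - 1}\big) \\
&\quad \le \sum_{i=1}^{n} P\Bigg( \max_{(z_{-k-i+1}^{-k},\, \tilde X_{-k+1}^0,\, x) \in \mathcal{L}_{k+i}^n} 1_{\{\zeta_{J(k)} \le \lceil n/2 \rceil - 1\}}
\Bigg| \frac{\#\{\lceil n/2 \rceil + k \le t \le n : X_{t-k}^t = (\tilde X_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k - 1 \le t \le n-1 : X_{t-k+1}^t = \tilde X_{-k+1}^0\}} \\
&\qquad\qquad - \frac{\#\{\lceil n/2 \rceil + k + i \le t \le n : X_{t-k-i}^t = (z_{-k-i+1}^{-k}, \tilde X_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k + i - 1 \le t \le n-1 : X_{t-k-i+1}^t = (z_{-k-i+1}^{-k}, \tilde X_{-k+1}^0)\}} \Bigg| > n^{-\beta},\ K(\tilde X_{-\infty}^0) = k \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg).
\end{aligned}
$$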



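Define $\mathcal{M}_{k-1}$ as the set of all
$x_{-k+1}^0 \in \mathcal{X}^k$ such that for all $i \ge 1$,
$z \in \mathcal{X}$, and $y_{-k-i+1}^{-k} \in \mathcal{X}^i$,
$p_{k+i}(y_{-k-i+1}^{-k}, x_{-k+1}^0, z) > 0$ implies that
$p_{k-1}(z \mid x_{-k+1}^0) = p_{k+i-1}(z \mid y_{-k-i+1}^{-k}, x_{-k+1}^0)$.
By the definition of $\hat\Delta_k^n$ and since
$K(\tilde X_{-\infty}^0) = k$ we have easily that

$$
\begin{aligned}
&P\Bigg( \max_{(z_{-k-i+1}^{-k},\, \tilde X_{-k+1}^0,\, x) \in \mathcal{L}_{k+i}^n} 1_{\{\zeta_{J(k)} \le \lceil n/2 \rceil - 1\}}
\Bigg| \frac{\#\{\lceil n/2 \rceil + k \le t \le n : X_{t-k}^t = (\tilde X_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k - 1 \le t \le n-1 : X_{t-k+1}^t = \tilde X_{-k+1}^0\}} \\
&\qquad - \frac{\#\{\lceil n/2 \rceil + k + i \le t \le n : X_{t-k-i}^t = (z_{-k-i+1}^{-k}, \tilde X_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k + i - 1 \le t \le n-1 : X_{t-k-i+1}^t = (z_{-k-i+1}^{-k}, \tilde X_{-k+1}^0)\}} \Bigg| > n^{-\beta},\ K(\tilde X_{-\infty}^0) = k \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg) \\
&\quad \le P\Bigg( \max_{(z_{-k-i+1}^{-k},\, y_{-k+1}^0,\, x) \in \mathcal{L}_{k+i}^n,\ y_{-k+1}^0 \in \mathcal{M}_{k-1}}
\Bigg| \frac{\#\{\lceil n/2 \rceil + k \le t \le n : X_{t-k}^t = (y_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k - 1 \le t \le n-1 : X_{t-k+1}^t = y_{-k+1}^0\}} \\
&\qquad - \frac{\#\{\lceil n/2 \rceil + k + i \le t \le n : X_{t-k-i}^t = (z_{-k-i+1}^{-k}, y_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k + i - 1 \le t \le n-1 : X_{t-k-i+1}^t = (z_{-k-i+1}^{-k}, y_{-k+1}^0)\}} \Bigg| > n^{-\beta} \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg).
\end{aligned}
$$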
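We can estimate this last probability as the sum of two terms:

$$
\begin{aligned}
&P\Bigg( \max_{(z_{-k-i+1}^{-k},\, y_{-k+1}^0,\, x) \in \mathcal{L}_{k+i}^n,\ y_{-k+1}^0 \in \mathcal{M}_{k-1}}
\Bigg| \frac{\#\{\lceil n/2 \rceil + k \le t \le n : X_{t-k}^t = (y_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k - 1 \le t \le n-1 : X_{t-k+1}^t = y_{-k+1}^0\}} \\
&\qquad - \frac{\#\{\lceil n/2 \rceil + k + i \le t \le n : X_{t-k-i}^t = (z_{-k-i+1}^{-k}, y_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k + i - 1 \le t \le n-1 : X_{t-k-i+1}^t = (z_{-k-i+1}^{-k}, y_{-k+1}^0)\}} \Bigg| > n^{-\beta} \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg) \\
&\quad \le P\Bigg( \max \Bigg| \frac{\#\{\lceil n/2 \rceil + k \le t \le n : X_{t-k}^t = (y_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k - 1 \le t \le n-1 : X_{t-k+1}^t = y_{-k+1}^0\}} - p_{k-1}(x \mid y_{-k+1}^0) \Bigg| > 0.5\, n^{-\beta} \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg) \\
&\qquad + P\Bigg( \max \Bigg| \frac{\#\{\lceil n/2 \rceil + k + i \le t \le n : X_{t-k-i}^t = (z_{-k-i+1}^{-k}, y_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k + i - 1 \le t \le n-1 : X_{t-k-i+1}^t = (z_{-k-i+1}^{-k}, y_{-k+1}^0)\}} - p_{k-1}(x \mid y_{-k+1}^0) \Bigg| > 0.5\, n^{-\beta} \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg),
\end{aligned}
$$

where both maxima are taken over the same index set as on the left hand side.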



We overestimate these probabilities. For any $m \ge 0$ and $x_{-m}^0$ define
$\sigma_i^m(x_{-m}^0)$ as the time of the $i$-th occurrence of the string
$x_{-m}^0$ in the data segment $X_{\lceil n/2 \rceil}^{\infty}$, that is, let
$\sigma_0^m(x_{-m}^0) = \lceil n/2 \rceil + m - 1$ and for $i \ge 1$ define

$$\sigma_i^m(x_{-m}^0) = \min\{t > \sigma_{i-1}^m(x_{-m}^0) : X_{t-m}^t = x_{-m}^0\}.$$



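Now

$$
\begin{aligned}
&P\Bigg( \max_{(z_{-k-i+1}^{-k},\, y_{-k+1}^0,\, x) \in \mathcal{L}_{k+i}^n,\ y_{-k+1}^0 \in \mathcal{M}_{k-1}}
\Bigg| \frac{\#\{\lceil n/2 \rceil + k \le t \le n : X_{t-k}^t = (y_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k - 1 \le t \le n-1 : X_{t-k+1}^t = y_{-k+1}^0\}} \\
&\qquad - \frac{\#\{\lceil n/2 \rceil + k + i \le t \le n : X_{t-k-i}^t = (z_{-k-i+1}^{-k}, y_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k + i - 1 \le t \le n-1 : X_{t-k-i+1}^t = (z_{-k-i+1}^{-k}, y_{-k+1}^0)\}} \Bigg| > n^{-\beta} \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg) \\
&\quad \le P\Bigg( \max_{(y_{-k+1}^0,\, x) \in \mathcal{L}_{n,k}^{(1)},\ y_{-k+1}^0 \in \mathcal{M}_{k-1}}\ \sup_{j > n^{1-\gamma}}
\Bigg| \frac{1}{j} \sum_{r=1}^{j} 1_{\{X_{\sigma_r^{k-1}(y_{-k+1}^0)+1} = x\}} - p_{k-1}(x \mid y_{-k+1}^0) \Bigg| > 0.5\, n^{-\beta} \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg) \\
&\qquad + P\Bigg( \max_{(z_{-k-i+1}^{-k},\, y_{-k+1}^0,\, x) \in \mathcal{L}_{n,k+i}^{(1)},\ y_{-k+1}^0 \in \mathcal{M}_{k-1}}\ \sup_{j > n^{1-\gamma}}
\Bigg| \frac{1}{j} \sum_{r=1}^{j} 1_{\{X_{\sigma_r^{k+i-1}(z_{-k-i+1}^{-k},\, y_{-k+1}^0)+1} = x\}} - p_{k-1}(x \mid y_{-k+1}^0) \Bigg| > 0.5\, n^{-\beta} \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg).
\end{aligned}
$$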



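Since both $\mathcal{L}_{n,k}^{(1)}$ and $\mathcal{L}_{n,k+i}^{(1)}$ depend
solely on $X_0^{\lceil n/2 \rceil - 1}$ we get

$$
\begin{aligned}
&P\Bigg( \max_{(z_{-k-i+1}^{-k},\, y_{-k+1}^0,\, x) \in \mathcal{L}_{k+i}^n,\ y_{-k+1}^0 \in \mathcal{M}_{k-1}}
\Bigg| \frac{\#\{\lceil n/2 \rceil + k \le t \le n : X_{t-k}^t = (y_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k - 1 \le t \le n-1 : X_{t-k+1}^t = y_{-k+1}^0\}} \\
&\qquad - \frac{\#\{\lceil n/2 \rceil + k + i \le t \le n : X_{t-k-i}^t = (z_{-k-i+1}^{-k}, y_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k + i - 1 \le t \le n-1 : X_{t-k-i+1}^t = (z_{-k-i+1}^{-k}, y_{-k+1}^0)\}} \Bigg| > n^{-\beta} \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg) \\
&\quad \le \sum_{(y_{-k+1}^0,\, x) \in \mathcal{L}_{n,k}^{(1)},\ y_{-k+1}^0 \in \mathcal{M}_{k-1}}\ \sum_{j > n^{1-\gamma}}
P\Bigg( \Bigg| \frac{1}{j} \sum_{r=1}^{j} 1_{\{X_{\sigma_r^{k-1}(y_{-k+1}^0)+1} = x\}} - p_{k-1}(x \mid y_{-k+1}^0) \Bigg| > 0.5\, n^{-\beta} \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg) \\
&\qquad + \sum_{(z_{-k-i+1}^{-k},\, y_{-k+1}^0,\, x) \in \mathcal{L}_{n,k+i}^{(1)},\ y_{-k+1}^0 \in \mathcal{M}_{k-1}}\ \sum_{j > n^{1-\gamma}}
P\Bigg( \Bigg| \frac{1}{j} \sum_{r=1}^{j} 1_{\{X_{\sigma_r^{k+i-1}(z_{-k-i+1}^{-k},\, y_{-k+1}^0)+1} = x\}} - p_{k-1}(x \mid y_{-k+1}^0) \Bigg| > 0.5\, n^{-\beta} \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg).
\end{aligned}
$$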

Each of these represents the deviation of an empirical count from its mean.
The variables in question are independent since whenever the block
$y_{-k+1}^0$ occurs the next term is chosen using the same distribution
$p_{k-1}(x \mid y_{-k+1}^0)$. Thus by Hoeffding's inequality (cf. Hoeffding
[8] or Theorem 8.1 of Devroye et al. [6]) for sums of bounded independent
random variables and since the cardinality of both $\mathcal{L}_{n,k}^{(1)}$
and $\mathcal{L}_{n,k+i}^{(1)}$ is not greater than $(n+2)/2$, we have

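$$
\begin{aligned}
&P\Bigg( \max_{(z_{-k-i+1}^{-k},\, y_{-k+1}^0,\, x) \in \mathcal{L}_{k+i}^n,\ y_{-k+1}^0 \in \mathcal{M}_{k-1}}
\Bigg| \frac{\#\{\lceil n/2 \rceil + k \le t \le n : X_{t-k}^t = (y_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k - 1 \le t \le n-1 : X_{t-k+1}^t = y_{-k+1}^0\}} \\
&\qquad - \frac{\#\{\lceil n/2 \rceil + k + i \le t \le n : X_{t-k-i}^t = (z_{-k-i+1}^{-k}, y_{-k+1}^0, x)\}}{\#\{\lceil n/2 \rceil + k + i - 1 \le t \le n-1 : X_{t-k-i+1}^t = (z_{-k-i+1}^{-k}, y_{-k+1}^0)\}} \Bigg| > n^{-\beta} \,\Bigg|\, X_0^{\lceil n/2 \rceil - 1} \Bigg) \\
&\quad \le 2\, \frac{n+2}{2} \sum_{j > n^{1-\gamma}} 2 e^{-j n^{-2\beta}/2}.
\end{aligned}
$$

Thus

$$P\big(\hat\Delta_k^n > n^{-\beta},\ K(\tilde X_{-\infty}^0) = k \,\big|\, X_0^{\lceil n/2 \rceil - 1}\big) \le n (n+2) \sum_{j > n^{1-\gamma}} 2 e^{-j n^{-2\beta}/2}.$$

Integrating both sides we get

$$P\big(\hat\Delta_k^n > n^{-\beta},\ K(\tilde X_{-\infty}^0) = k\big) \le n (n+2) \sum_{j > n^{1-\gamma}} 2 e^{-j n^{-2\beta}/2}.$$

The right hand side is summable provided $2\beta + \gamma < 1$ and the
Borel-Cantelli Lemma yields that

$$P\big(\hat\Delta_k^n \le n^{-\beta} \text{ eventually},\ K(\tilde X_{-\infty}^0) = k\big) = P\big(K(\tilde X_{-\infty}^0) = k\big).$$

Thus $\chi_n \le k$ eventually almost surely on
$K(\tilde X_{-\infty}^0) = k$.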
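Step 3. We show the first part of the Theorem.

Recalling (6) we can write

$$f_n = \frac{1}{n} \sum_{i=0}^{n-1} \big[ f(X_{\lambda_i+1}) - E(f(X_{\lambda_i+1}) \mid X_{-\infty}^{\lambda_i}) \big] + \frac{1}{n} \sum_{i=0}^{n-1} E(f(X_{\lambda_i+1}) \mid X_{-\infty}^{\lambda_i}). \tag{9}$$

Observe that the first term is an average of orthogonal bounded random
variables and by Theorem 3.2.2 in Revesz [13], it tends to zero.
Now we deal with the second term. If $K(\tilde X_{-\infty}^0) < \infty$ then
by Step 2, $\chi_n = K(\tilde X_{-\infty}^0)$ eventually and by (1), (2), (4)
and Step 1, eventually,

$$E(f(X_{\lambda_i+1}) \mid X_{-\infty}^{\lambda_i}) = E(f(X_{\lambda_i+1}) \mid X_0^{\lambda_i}) = F(\tilde X_{-\infty}^0).$$

We may deal with the case when $K(\tilde X_{-\infty}^0) = \infty$ and by
Step 2, $\chi_n \to \infty$. For arbitrary $j > 0$, by (5) and (4) and the
construction in (2),

$$X_{\lambda_j-\kappa_j+1}^{\lambda_j} = \tilde X_{-\kappa_j+1}^{0} \quad \text{and} \quad \lim_{j\to\infty} d^*(\tilde X_{-\infty}^0, X_{-\infty}^{\lambda_j}) = 0 \quad \text{almost surely.} \tag{10}$$

By Step 1, and the almost sure continuity of $F(\cdot)$, for some set
$C \subseteq \mathcal{X}^{*-}$ with full measure, $F(\cdot)$ is continuous on
$C$ and

$$\tilde X_{-\infty}^0 \in C, \quad X_{-\infty}^{\lambda_n} \in C \text{ for all } n \ge 0 \quad \text{almost surely.} \tag{11}$$

By the continuity of $F(\cdot)$ on the set $C$ and (10),
$E(f(X_{\lambda_n+1}) \mid X_{-\infty}^{\lambda_n}) = F(X_{-\infty}^{\lambda_n}) \to F(\tilde X_{-\infty}^0)$
and $f_n \to F(\tilde X_{-\infty}^0)$ almost surely.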

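Define the random neighbourhood $\mathcal{N}_j(X_0^{\lambda_j})$ of
$X_0^{\lambda_j}$ depending on the random data segment $X_0^{\lambda_j}$
itself as

Note that by (1), (2), (4) and (5),
$\tilde X_{-\infty}^0 \in \mathcal{N}_j(X_0^{\lambda_j})$ and by (11) and the
continuity of $F(\cdot)$ on the set $C$, and since $\kappa_j \to \infty$, by
(10), almost surely,

$$
\begin{aligned}
\lim_{j\to\infty} \left| E(f(X_{\lambda_j+1}) \mid X_0^{\lambda_j}) - F(\tilde X_{-\infty}^0) \right|
&= \lim_{j\to\infty} \left| E\{F(X_{-\infty}^{\lambda_j}) \mid X_0^{\lambda_j}\} - F(\tilde X_{-\infty}^0) \right| \\
&\le \lim_{j\to\infty}\ \sup_{y_{-\infty}^0 \in \mathcal{N}_j(X_0^{\lambda_j}) \cap C} \left| F(y_{-\infty}^0) - F(\tilde X_{-\infty}^0) \right| = 0.
\end{aligned}
$$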

Step 4. We show the second part of the Theorem.

Now we assume that the stationary and ergodic finite or countably infinite
alphabet time series $\{X_n\}$ possesses finite entropy rate $H$. (A
stationary finite alphabet time series always has finite entropy rate.)
We will in fact obtain a more precise estimate, namely, if for some
$0 < \epsilon_2 < \epsilon_1$,
$\sum_{k=1}^{\infty} (k+1)\, 2^{-l_k(\epsilon_1-\epsilon_2)} < \infty$ then

$$\lambda_n < 2^{l_n(H+\epsilon_1)} \quad \text{eventually almost surely.}$$

In particular, for arbitrary $\delta > 0$, $0 < \epsilon_2 < \epsilon_1$, if
$l_n = \min\big(n, \max\big(1, \lfloor \frac{2+\delta}{\epsilon_1-\epsilon_2} \log_2 n \rfloor\big)\big)$
then

$$\lambda_n < n^{\frac{2+\delta}{\epsilon_1-\epsilon_2}(H+\epsilon_1)}$$

eventually almost surely, and the upper bound is a polynomial.
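To get a feel for the sizes involved, the following sketch evaluates a logarithmic block-length sequence of the kind described above together with the exponent of the resulting polynomial bound. The function names `block_length` and `polynomial_exponent`, and the exact form $\frac{2+\delta}{\epsilon_1-\epsilon_2}\log_2 n$ of the growth rate, are reconstructed assumptions for illustration rather than a definitive statement of the paper's constants.

```python
import math


def block_length(n, delta, eps1, eps2):
    """l_n = min(n, max(1, floor((2 + delta) / (eps1 - eps2) * log2(n))))."""
    return min(n, max(1, math.floor((2 + delta) / (eps1 - eps2) * math.log2(n))))


def polynomial_exponent(H, delta, eps1, eps2):
    """Exponent c such that 2**(l_n * (H + eps1)) <= n**c for the l_n above."""
    return (2 + delta) * (H + eps1) / (eps1 - eps2)
```

The sequence satisfies $l_1 = 1$ and $l_n \le n$ by construction. With an assumed entropy rate of one bit and `delta = 1.0`, `eps1 = 0.5`, `eps2 = 0.25`, the exponent is 18, which shows that the polynomial bound, although far below exponential growth, can still be of high degree.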

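Since $\lambda_n \le \zeta_n$, it is enough to prove the result for
$\zeta_n$. Let $\mathcal{X}^*$ be the set of all two-sided sequences, that is,

$$\mathcal{X}^* = \{(\dots, x_{-1}, x_0, x_1, \dots) : x_i \in \mathcal{X} \text{ for all } -\infty < i < \infty\}.$$

Define $B_k \subseteq \mathcal{X}^{l_k}$ as
$B_k = \{x_{-l_k+1}^0 \in \mathcal{X}^{l_k} : 2^{-l_k(H+\epsilon_2)} < p_{l_k-1}(x_{-l_k+1}^0)\}$.
Note that there is a trivial bound on the cardinality of the set $B_k$,
namely,

$$|B_k| \le 2^{l_k(H+\epsilon_2)}. \tag{12}$$

Define the set $\Upsilon_k(y_{-l_k+1}^0)$ as follows:

We will estimate the probability of $\Upsilon_k(y_{-l_k+1}^0)$ by a frequency
argument. Let $x_{-\infty}^{\infty} \in \mathcal{X}^*$ be a typical sequence
of the time series $\{X_n\}$. Define
$\rho_0(y_{-l_k+1}^0, x_{-\infty}^{\infty}) = 0$ and for $i \ge 1$ let

$$\rho_i(y_{-l_k+1}^0, x_{-\infty}^{\infty}) = \min\{l > \rho_{i-1}(y_{-l_k+1}^0, x_{-\infty}^{\infty}) : T^{-l} x_{-\infty}^{\infty} \in \Upsilon_k(y_{-l_k+1}^0)\}.$$

Define also $\tau_0(y_{-l_k+1}^0, x_{-\infty}^{\infty}) = 0$ and for
$i \ge 1$ let

$$\tau_i(y_{-l_k+1}^0, x_{-\infty}^{\infty}) = \min\{l \ge \tau_{i-1}(y_{-l_k+1}^0, x_{-\infty}^{\infty}) + 2^{l_k(H+\epsilon_1)} : T^{-l} x_{-\infty}^{\infty} \in \Upsilon_k(y_{-l_k+1}^0)\}.$$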

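Notice that if $\tau_{j-1} = \rho_m$ then $\tau_j \le \rho_{m+k+1}$. (Indeed,
since there are at least $k+1$ occurrences of the block $y_{-l_k+1}^0$ in the
data segment $x_{-\rho_{m+k+1}}^{-\rho_m}$, hence
$2^{l_k(H+\epsilon_1)} < -\hat\zeta_k^k(T^{-\rho_m} x_{-\infty}^{\infty}) \le \rho_{m+k+1} - \tau_{j-1}$.)
By the ergodicity of the time series $\{X_n\}$,

$$P\big(\Upsilon_k(y_{-l_k+1}^0)\big) = \lim_{t\to\infty} \frac{\sum_{i=1}^{t} \#\{j \ge 1 : \tau_{i-1}(y_{-l_k+1}^0, x_{-\infty}^{\infty}) < \rho_j(y_{-l_k+1}^0, x_{-\infty}^{\infty}) \le \tau_i(y_{-l_k+1}^0, x_{-\infty}^{\infty})\}}{\tau_t(y_{-l_k+1}^0, x_{-\infty}^{\infty})} \le \frac{t(k+1)}{t\, 2^{l_k(H+\epsilon_1)}} = \frac{k+1}{2^{l_k(H+\epsilon_1)}}. \tag{13}$$

Since

by stationarity and the upper bound on the cardinality of the set $B_k$ in
(12) and by (13), we get

$$P\big(\zeta_k > 2^{l_k(H+\epsilon_1)},\ \tilde X_{-l_k+1}^0 \in B_k\big) = P\big(-\hat\zeta_k^k > 2^{l_k(H+\epsilon_1)},\ X_{-l_k+1}^0 \in B_k\big) \le (k+1)\, 2^{-l_k(\epsilon_1-\epsilon_2)}.$$

By assumption, the right hand side sums and the Borel-Cantelli Lemma
yields that the event
$\{\zeta_k > 2^{l_k(H+\epsilon_1)},\ \tilde X_{-l_k+1}^0 \in B_k\}$ cannot
happen infinitely many times. By Step 1, the distribution of the time series
$\{\tilde X_n\}$ is the same as the distribution of $\{X_n\}$ and by the
Shannon-McMillan-Breiman Theorem (cf. Chung [2])
$\tilde X_{-l_k+1}^0 \in B_k$ eventually almost surely and so
$\zeta_k > 2^{l_k(H+\epsilon_1)}$ cannot happen infinitely many times.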
Step 5. We show the rest of the Theorem.

By Step 2, if $1 \le K(\tilde X_{-\infty}^0) < \infty$ then
$\chi_n = K(\tilde X_{-\infty}^0)$ eventually, and by ergodicity,
$\frac{n}{\lambda_n} \to p_{K(\tilde X_{-\infty}^0)-1}\big(\tilde X_{-K(\tilde X_{-\infty}^0)+1}^0\big) > 0$.
If $K(\tilde X_{-\infty}^0) = 0$ then by Step 2, $\chi_n = 0$ eventually, and
by (4), $\lambda_n = \lambda_{n-1} + 1$ eventually. The proof of the Theorem
is complete.